Introduction



Many economic time series are non-stationary, and analysts have to take account of that fact. A time series can be trend-stationary or have a random walk component (difference-stationarity). In the first case shocks are temporary, whereas shocks to a random walk are permanent. For unit root tests we refer to Dickey and Fuller (1979), Phillips (1979), Phillips and Perron (1988), Bierens (1997), and Breitung (2002). Often, in particular for financial data, the unit root hypothesis cannot be rejected, and then we are interested in detecting, as soon as possible, a change-point where the time series is affected by an additional deterministic drift term. The problem discussed in this article should not be confused with the so-called random walk hypothesis, which addresses a different issue, namely whether future values are predictable using past values. For that problem we refer to French and Roll (1986), Fama and French (1988), Lo and MacKinlay (1988), Poterba and Summers (1988), and Jegadeesh (1991).

An important property of a random walk is that it exhibits stochastic trends which can be confused with deterministic trends. Nevertheless, a random walk, i.e., a stochastic trend, can be overlaid by a deterministic trend component. Hence we study the problem of detecting a nonparametric drift component in a random walk. We assume that the observations $Y_{N,1}, Y_{N,2}, \dots, Y_{N,N}$ arrive sequentially and

$$Y_{N,n+1} = Y_{N,n} + m_{N,n} + u_n, \qquad n = 1, \dots, N, \ N \in \mathbb{N},$$

where $u_n$ are i.i.d. innovations with $E(u_n) = 0$ and $0 < \operatorname{Var}(u_n) < \infty$. For the weak distributional limits presented in this paper the i.i.d. assumption can be relaxed to a weak condition discussed in detail in Section 1, which allows, e.g., for correlated time series with GARCH effects. To study asymptotic properties analytically, we will model the deterministic drift $m_{N,n}$ more explicitly. However, the detection procedure does not depend on a specification of the alternative; it decides after each new observation whether to continue sampling or to stop and reject the null hypothesis of no drift. In any case we stop no later than after the $N$th observation, where $N$ is fixed in advance.

Whereas a posteriori methods aim at consistently estimating the time point where the mean changes and therefore employ data before and after the change-point, sequential monitoring methods use only past and current data, aiming at detecting a change as soon as possible. The a posteriori approach is well studied. For example, Kim and Hart (1998) propose a nonparametric approach to test for a change in a mean function when the data are dependent. Predictive tests for structural change with unknown change-point have been studied in Ghysels, Guay and Hall (1997). The analysis of multiple structural changes in linear models has been discussed, e.g., in Bai and Perron (1998). Yakir, Krieger and Pollak (1990) use the data after the change for optimization. Huskova and Slaby (2001) studied nonparametric multiple change-point detection based on kernel-weighted means similar to those studied in this article. Kernel-weighted averages have also been discussed by Ferger (1994b, 1994c, 1995, 1996) and Brodsky and Darkhovsky (1993, 2002), where the latter examine a posteriori and monitoring procedures. Sequential monitoring procedures to control the derivative of a process mean have been studied in Schmid and Steland (2000). Results for jump-preserving smoothers can be found in Chiu et al. (1998), Pawlak and Rafajlowicz (2000, 2001), Rue et al. (2002), Steland (2002c, 2004a, 2005a), and Pawlak, Rafajlowicz and Steland (2004). For the application of U-statistics we refer to Ferger (1994a, 1997), Gombay and Horvath (1995), and Horvath and Huskova (2003).

The contribution of the present paper is to study sequential smoothers to monitor random walks for deterministic drifts, and to contrast the results with former work on stationary processes (Steland 2004b, 2005b). Whereas in the stationary case the normed delay of the procedure converges to a deterministic constant, for a random walk the relevant (kernel-weighted) partial sums have a different convergence rate. Hence, the statistics have to be scaled appropriately to obtain well-defined limit distributions. Further, depending on the rate of convergence of the local alternative, we obtain a deterministic or a stochastic limit under the alternative. As a by-product, we provide the asymptotic law of the Nadaraya-Watson estimator. Our approach via kernel-weighted sequential partial sum processes yields asymptotic results for both the classic fixed sample design and the sequential sampling design. In contrast to classic nonparametric regression, the monitoring framework, as suggested by Wald (1947), Siegmund (1985), Brodsky and Darkhovsky (1993), and many others, assumes observations at fixed time points.

The paper is organized as follows. Section 1 discusses the random walk model with local drift and the proposed monitoring procedure. Section 2 gives a brief discussion of the asymptotics for a stationary AR(1) process. Section 3 provides the new results about the control statistic under the random walk model for both the null hypothesis and the alternative. We also discuss general time designs in Section 4, which allow one to thin a time series with respect to time. The results are applied in Section 5 to derive the related results for the sequential stopping procedures. Section 6 studies the question of optimal kernel choice. Finally, in Section 7 we study the accuracy of the asymptotic distributions by simulations.
1. Model, method, and assumptions



We aim at detecting a nonparametric trend starting at a so-called change-point (break-point) in the presence of a pure random walk without drift. In this section we explain in detail the model, the proposed method, and the required assumptions.

1.1. Non-stationary time series model. Assume the data $Y_{N,1}, \dots, Y_{N,N}$, $N \in \mathbb{N}$, arrive sequentially,

$$Y_{N,n+1} = Y_{N,n} + m_{N,n} + u_n, \qquad 1 \le n \le N, \ N \in \mathbb{N},$$

where $\{u_n\}$ is a sequence of innovation terms with $E(u_n) = 0$ and common variance $0 < \sigma^2 < \infty$. We assume that the observation $Y_n$ is taken at time $t_n$, where $\{t_n\}$ denotes a deterministic and ordered sequence of time points. For convenience, we assume $t_n = n$, $n \in \mathbb{N}$. Generalizations to other designs are straightforward and are discussed in Section 4.

We will study a detection procedure which does not depend on a specification of the drift $m_{N,n}$. The null hypothesis (in-control model) is that $m_{N,n}$ vanishes for all $N, n \in \mathbb{N}$; in this case $Y_{N,n+1} = Y_{N,n} + u_n$. The alternative says that the mean changes starting at a change-point $t_q$ specified below. Our limit theorems work under the following sequence of alternative models (out-of-control models) for the drift term. We assume

$$m_{N,n} = m_0([t_n - t_q]/h_N)\, h_N^{\beta}, \qquad n \in \mathbb{N}, \tag{1}$$

where $h = h_N$ is a sequence of positive constants with

$$N/h_N \to \zeta \in [1, \infty), \qquad \text{as } N \to \infty. \tag{2}$$

The function $m_0$, called the generic alternative, is continuous with $m_0(t) = 0$ for $t \le 0$ and $m_0(t) \ge 0$ for $t > 0$; $m_0 = 0$ corresponds to the null hypothesis. The function $m_0$ is given by nature and unknown to us. However, in many applications it may be possible to define, e.g., a worst-case scenario in terms of $m_0$. Then our results can be used to assess the performance of the procedure under that scenario. The parameter $\beta \in (-1, 0]$ is a tuning parameter which controls the rate of convergence. If $m_0(t) > 0$ for $t \in (0, t^*)$ and some $t^* > 0$, then there is a change at time $t_q$, and $t_q$ is called the change-point. In this paper we address the following two change-point models.

Change-point model CP1: Having in mind applications where it is reasonable to assume that a change may occur at a fixed given date, e.g., when a firm publishes its balance sheet, it is assumed that $t_q = q \in \mathbb{N}$ is a fixed integer. Consequently, if $m_0$ does not vanish, there is a change for each fixed $N$, but the percentage of pre-change observations tends to 0 as $N$ tends to $\infty$. It will turn out that in this case the asymptotic limit depends on the function $m_0$, but not on the change-point.

Change-point model CP2: This approach, which is well established in the literature, assumes that the change occurs after a fixed fraction of the data, i.e.,

$$t_q = t_{N\vartheta} = \lfloor N\vartheta \rfloor, \qquad \text{for some } \vartheta \in (0, 1).$$

Here and in the sequel we denote by $\lfloor x \rfloor$ the largest integer less than or equal to $x \in \mathbb{R}$. Under this model the asymptotic limit will depend on the change-point parameter $\vartheta$, too.
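To make the local alternative (1) concrete, the following minimal Python sketch generates one path $Y_{N,1}, \dots, Y_{N,N}$ under either change-point model. It assumes $t_n = n$, $Y_{N,1} = 0$ and i.i.d. Gaussian innovations; the function names and the linear generic alternative `linear_m0` are hypothetical illustrations, not part of the method itself.

```python
import math
import random


def local_drift(n, N, h, m0, beta=0.0, model="CP1", q=1, theta=0.5):
    """m_{N,n} = m0((t_n - t_q)/h) * h**beta with t_n = n, cf. (1).

    model="CP1": fixed change-point t_q = q; model="CP2": t_q = floor(N * theta).
    """
    t_q = q if model == "CP1" else math.floor(N * theta)
    return m0((n - t_q) / h) * h ** beta


def simulate_path(N, h, m0, beta=0.0, model="CP1", q=1, theta=0.5,
                  sigma=1.0, rng=None):
    """Y_{N,1}, ..., Y_{N,N} with Y_{N,n+1} = Y_{N,n} + m_{N,n} + u_n, Y_{N,1} = 0."""
    rng = rng or random.Random()
    y = [0.0]
    for n in range(1, N):
        # i.i.d. N(0, sigma^2) innovation; sigma = 0 yields the pure drift path
        u = rng.gauss(0.0, sigma) if sigma > 0 else 0.0
        y.append(y[-1] + local_drift(n, N, h, m0, beta, model, q, theta) + u)
    return y


def linear_m0(t):
    """Hypothetical generic alternative: zero before the change, linear afterwards."""
    return max(t, 0.0)
```

With `sigma=0.0` the path shows the cumulated deterministic drift only, which makes it easy to see how $\beta$ and $h$ damp the alternative.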

Remark 1.1. Let us briefly discuss our approach to defining local alternatives nonparametrically in more detail. We may write $m_{N,n} = m_N(t_n; \beta)$, where $m_N(t; \beta) = m_0([t - t_q]/h)\, h^{\beta}$. Hence, since $N/h \to \zeta$, for each fixed $t$ we have $\lim_{N\to\infty} \{m_N(t;\beta) - m_0(0)\} = 0$. Provided $m_0(t)$ is twice differentiable at $t = 0$ with $|m_0''(0)| < \infty$, we have $m_N(t;\beta) = m_0'(0)(t - t_q)\, h^{\beta-1} + O(h^{\beta-2})$. Thus, the underlying drift tends to zero at rate $h^{\beta-1}$, point-wise.

Although in this article we do not discuss the case of dependent innovations in detail, our results work under the following general assumption.

Assumption (A): The stationary sequence $\{u_n\}$ ensures that the partial sum process $N^{-1/2}\sum_{i=1}^{\lfloor Nr \rfloor} u_i$, $r \in [0, 1]$, converges weakly to scaled Brownian motion $\sigma B(r)$ for some constant $0 < \sigma < \infty$, which is determined by $\sigma^2 = \lim_{N\to\infty} E\big(N^{-1/2}\sum_{i=1}^{N} u_i\big)^2$.

It is worth discussing assumption (A). First, note that it covers weakly dependent innovations as arising in stationary ARMA or GARCH models, provided certain additional conditions are fulfilled. In particular, Basrak, Davis and Mikosch (2003) have shown that GARCH(p,q) models, $Y_n = \sigma_n \epsilon_n$, $\sigma_n^2 = \alpha_0 + \sum_{i=1}^{p} \alpha_i Y_{n-i}^2 + \sum_{j=1}^{q} \beta_j \sigma_{n-j}^2$, where $\{\epsilon_n\}$ are i.i.d. with $E\epsilon_n = 0$ and $E\epsilon_n^2 = 1$, are strictly stationary and strongly mixing with geometric rate if $\alpha_0 > 0$, $\sum_{i=1}^{p}\alpha_i + \sum_{j=1}^{q}\beta_j < 1$, and a moment condition on $|\epsilon_1|$ holds, provided the series is started with its stationary distribution. For a general sufficient condition for (A) in terms of the $\alpha$-mixing coefficients $\{\alpha(k) : k \in \mathbb{N}\}$ of $\{u_n\}$ we refer to Herrndorf (1985), which in particular yields (A) provided there exists some $\delta > 0$ such that $E|u_1|^{2+\delta} < \infty$ and $\sum_{k=1}^{\infty} \alpha(k)^{\delta/(2+\delta)} < \infty$. Finally, note that this assumption is often considered as a nonparametric definition of an I(0) process (e.g., Davidson, 2002).

1.2. The monitoring procedure. We monitor the time series by a sequential kernel smoother

$$\hat m_n = \sum_{i=1}^{n} K_h(t_i - t_n)\, Y_i \Big/ \sum_{i=1}^{n} K_h(t_i - t_n),$$



which employs only past and current data. The associated kernel-weighted sequential partial sum process is defined as

$$\hat m_N(s) = \hat m_{\lfloor Ns \rfloor}, \qquad s \in [0, 1].$$

Here $h$ is a bandwidth parameter given in advance and $K_h(z) = h^{-1}K(z/h)$ is the rescaled version of the smoothing kernel $K$. If $K$ vanishes outside the interval $[-1, 1]$, $h$ equals the number of past observations used by the procedure. To obtain meaningful results under alternatives, namely weak limits, it turns out that the smoothing parameter $h$ should converge to $\infty$ as $N \to \infty$, i.e., $h = h(N) \uparrow \infty$. Further, $h(N)$ and the sequence $h_N$ appearing in the definition of the local alternatives should satisfy $\lim_{N\to\infty} h(N)/h_N = c$ for some constant $c > 0$. That constant can be absorbed in the unknown function $m_0$. Thus, for simplicity we assume $h(N) = h_N$. The asymptotic framework is parameterized by the maximum sample size $N \to \infty$ under the condition

$$N/h \to \zeta \in [1, \infty). \tag{3}$$

Note that the random function $\hat m_N(\cdot)$ is an element of the Skorokhod space $D[0, 1]$, consisting of all right-continuous functions with left-hand limits. We will denote convergence in distribution of random variables and random vectors by $\xrightarrow{d}$. Weak convergence in the space $D[0, 1]$ will be denoted by $\Rightarrow$.

The time series is now monitored by the truncated stopping rule

$$S_N = \inf\{1 \le n \le N : T_n(N) > c\}, \qquad T_n(N) = c(h, N)\, \hat m_n,$$

with $\inf \emptyset = N$. Here $c(h, N)$ is a scaling function to be chosen later, and $T_n(N)$ is the rescaled sequential smoother. Note that $S_N$ is the index of the first time point where the rescaled kernel smoother exceeds the threshold (critical value) $c$. The monitoring procedure is truncated, i.e., we stop monitoring at $N$. Note that the definition of $S_N$ does not depend on any model specification of the alternative.
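The smoother and the truncated stopping rule can be sketched in a few lines of Python. This is a minimal illustration assuming $t_i = i$ and a Gaussian kernel; `kernel_smoother`, `stopping_time` and the argument `scale` (standing for $c(h, N)$) are hypothetical names. The factor $1/h$ in $K_h$ cancels in the ratio and is therefore omitted.

```python
import math


def gaussian_kernel(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def kernel_smoother(y, n, h, kernel=gaussian_kernel):
    """hat m_n computed from Y_1, ..., Y_n only (t_i = i); y[i-1] holds Y_i."""
    w = [kernel((i - n) / h) for i in range(1, n + 1)]  # K((t_i - t_n)/h)
    return sum(wi * yi for wi, yi in zip(w, y[:n])) / sum(w)


def stopping_time(y, h, c, scale, kernel=gaussian_kernel):
    """S_N = inf{1 <= n <= N : scale * hat m_n > c}, with inf of the empty set = N."""
    N = len(y)
    for n in range(1, N + 1):
        if scale * kernel_smoother(y, n, h, kernel) > c:
            return n
    return N
```

Since the weights depend on the current time $n$, they are recomputed at every step, which costs $O(N^2)$ operations; for moderate $N$ this is unproblematic. A scaling of the form $hN^{-3/2}$ would be passed as `scale=h * len(y) ** -1.5`.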

Concerning the smoothing kernel we make the following assumption.

(K) $K$ is a Lipschitz continuous probability density with mean 0 and finite variance. Let $L$ be the Lipschitz constant, i.e.,

$$|K(z_1) - K(z_2)| \le L\,|z_1 - z_2|$$

holds true for all $z_1, z_2 \in \mathbb{R}$.

For results under the alternative we need the following conditions.

(M) $m_0$ is a piecewise continuous function.

(KM) For the function

$$I(x) = \int_0^x K(s - x) \int_0^s m_0(r)\, dr\, ds,$$

assume $|I(x)| < \infty$ for all $x > 0$, $I \in C(\mathbb{R}_0^+)$, $K(\cdot)\, m_0$ has bounded variation, and there exists some $x^* > 0$ such that $I(x^*) > c$.

A nuisance-free procedure. It will turn out that the limiting distribution of $\hat m_n$ depends on the nuisance parameter $\sigma^2$. A simple candidate estimator is the naive estimator

$$\hat\sigma_n^2 = \frac{1}{n-1}\sum_{i=2}^{n} (\Delta Y_i)^2,$$

where $\Delta Y_i = Y_i - Y_{i-1}$, $i = 2, \dots, n$. Recall that $\hat\sigma_n^2$ is consistent for $\sigma^2$ under the null hypothesis if $\{\Delta Y_n\}$ is a linear process, $\Delta Y_n = \sum_{j=-\infty}^{\infty} \psi_j Z_{n-j}$, where $\{Z_j\}$ are i.i.d.$(0, \eta^2)$ with $E Z_j^4 < \infty$ and coefficients satisfying $\sum_{j=-\infty}^{\infty} |\psi_j| < \infty$ (Brockwell and Davis, 1991, Prop. 7.3.4).

A better choice may be Gasser's estimator, which is based on a local linear fitting procedure (Gasser et al., 1986). Define the pseudo-residuals

$$\tilde\epsilon_n = 0.5\,\Delta Y_{n-1} + 0.5\,\Delta Y_{n+1} - \Delta Y_n$$

and note that $E\tilde\epsilon_n^2 = D^2(n, h) + (3/2)\sigma^2$, where $D(n, h) = (1/2)(m_{n-1} - m_n + m_{n+1} - m_n)$. By (1), $D^2(n, h) = O(h^{2\beta-4})$ as $h \to \infty$, provided $m_0$ is twice continuously differentiable. This yields the following proposition.

Proposition 1.1. If $m_0$ is twice continuously differentiable, the estimator

$$\tilde\sigma_n^2 = \frac{2}{3(n-2)} \sum_{i=2}^{n-1} \tilde\epsilon_i^2$$

is asymptotically unbiased as $h \to \infty$, if $\{u_n\}$ are i.i.d. with existing second moment.



Thus, whereas the estimator $\hat\sigma_n^2$ tends to overestimate the variance in the presence of a drift, $\tilde\sigma_n^2$ may produce more reliable estimates. A related estimator is Rice's estimator given by $\frac{1}{2(n-2)}\sum_{i=2}^{n-1}(\Delta Y_{i+1} - \Delta Y_i)^2$.

Thus, we may use the asymptotically nuisance-free control statistic $T_n^*(N) = s_n^{-1} T_n(N)$, where $s_n^2$ is one of the estimators discussed above.
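The three variance estimators can be compared in a short Python sketch. Assumptions: the naive estimator is taken as the mean of the squared increments $\Delta Y_i$ without mean correction, and Gasser's estimator averages the pseudo-residuals that are computable from $\Delta Y_2, \dots, \Delta Y_n$, scaled by $2/3$; all function names are hypothetical.

```python
import random


def increments(y):
    """Delta Y_i = Y_i - Y_{i-1}, i = 2, ..., n."""
    return [b - a for a, b in zip(y, y[1:])]


def naive_variance(y):
    """Mean of squared increments (assumed form of the naive estimator)."""
    d = increments(y)
    return sum(x * x for x in d) / len(d)


def gasser_variance(y):
    """(2/3) * mean of eps_i^2 with eps_i = 0.5 dY_{i-1} + 0.5 dY_{i+1} - dY_i."""
    d = increments(y)
    eps = [0.5 * d[i - 1] + 0.5 * d[i + 1] - d[i] for i in range(1, len(d) - 1)]
    return (2.0 / 3.0) * sum(e * e for e in eps) / len(eps)


def rice_variance(y):
    """Sum of squared differences of increments divided by 2(n - 2)."""
    d = increments(y)
    diffs = [b - a for a, b in zip(d, d[1:])]
    return sum(x * x for x in diffs) / (2.0 * len(diffs))


def random_walk(n, drift=0.0, sigma=1.0, rng=None):
    """Y_1, ..., Y_n with Y_{k+1} = Y_k + drift + u_k, u_k ~ N(0, sigma^2), Y_0 = 0."""
    rng = rng or random.Random()
    acc, y = 0.0, []
    for _ in range(n):
        acc += drift + rng.gauss(0.0, sigma)
        y.append(acc)
    return y
```

With a constant drift, e.g. `random_walk(50000, drift=1.0)`, this naive estimator is inflated by roughly the squared drift, whereas the difference-based estimators of Gasser and Rice remain close to $\sigma^2$.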
2. ASYMPTOTICS FOR STATIONARY AR PROCESSES



Before turning our attention to the random walk case, let us briefly discuss the situation for a stationary AR(1) process. The asymptotic behavior follows from general results obtained for stationary $\alpha$-mixing sequences of innovations, but the resulting formulas are slightly different and less explicit.

In this section we assume that $Y_{N,1}, \dots, Y_{N,N}$ are observations arriving sequentially and

$$Y_{N,n+1} = a\, Y_{N,n} + m_{N,n} + u_n, \qquad n = 1, \dots, N, \ N \in \mathbb{N},$$

where the AR parameter $a$ satisfies $|a| < 1$ and $\{u_n\}$ is an i.i.d. sequence of innovations with $E(u_n) = 0$ and $0 < \operatorname{Var}(u_n) = \sigma^2 < \infty$. The deterministic drift component is given by

$$m_{N,n} = m_0([t_n - t_q]/h),$$

with $m_0$ as in Section 1, but at this point we put $\beta = 0$. Note that $\{Y_{N,n} : n\}$ is stationary under $H_0$.

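We have $Y_{N,n+1} = \sum_{i \ge 0} a^i u_{n-i} + \sum_{i \ge 0} a^i m_{N,n-i}$, where $\sum_{i \ge 0} a^i u_{n-i}$ is a stationary process with autocovariance function

$$r_0(k) = a^{|k|}\sigma^2/(1 - a^2), \qquad |k| \in \mathbb{N}_0,$$

thus being $\alpha$-mixing with geometric rate.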
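As a quick numerical check of the autocovariance formula $r_0(k) = a^{|k|}\sigma^2/(1 - a^2)$, the following Python sketch simulates a long driftless AR(1) path and compares sample autocovariances with the formula. The burn-in length and the function names are illustrative assumptions.

```python
import random


def ar1_autocov(k, a, sigma=1.0):
    """r_0(k) = a^{|k|} sigma^2 / (1 - a^2), valid for |a| < 1."""
    return a ** abs(k) * sigma ** 2 / (1.0 - a * a)


def simulate_ar1(n, a, sigma=1.0, burn_in=1000, rng=None):
    """X_{t+1} = a X_t + u_t with i.i.d. N(0, sigma^2) noise; burn-in discarded."""
    rng = rng or random.Random()
    x, out = 0.0, []
    for t in range(n + burn_in):
        x = a * x + rng.gauss(0.0, sigma)
        if t >= burn_in:
            out.append(x)
    return out


def sample_autocov(x, k):
    """Sample autocovariance at lag k (divisor n)."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n
```

The burn-in approximately starts the recursion in its stationary distribution, which is what the formula refers to.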

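Under the null hypothesis $H_0 : m_0 = 0$ we may apply Theorem 3.1 of Steland (2004b) to obtain weak convergence at the usual rate $N^{1/2}$, i.e.,

$$N^{1/2}\, \hat m_N(s) \Rightarrow \mathcal{M}_\zeta(s), \qquad \text{in } D[0, 1], \tag{4}$$

as $N \to \infty$, where $\mathcal{M}_\zeta(s)$ is a centered Gaussian process with correlation kernel given by

$$\operatorname{Cor}(\mathcal{M}_\zeta(s), \mathcal{M}_\zeta(t)) = C_\zeta(s, t) \Big/ \left( \int_0^{\zeta s} K(z - \zeta s)\, dz \int_0^{\zeta t} K(z - \zeta t)\, dz \right),$$

for $0 < s \le t \le 1$, with

$$C_\zeta(s, t) = \lim_{N\to\infty} N \sum_{i=1}^{\lfloor Ns \rfloor} \sum_{j=1}^{\lfloor Nt \rfloor} K_h(t_i - t_{\lfloor Ns \rfloor})\, K_h(t_j - t_{\lfloor Nt \rfloor})\, r_0(i - j).$$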
Due to the Lipschitz continuity of $K$, the sample paths of $\mathcal{M}_\zeta$ are continuous w.p. 1. Note that

$$I(m_0) = \lim_{N\to\infty} \sum_{j=0}^{N} a^j m_0([n - j]/h) < \infty,$$

if $\int m_0(s)\, ds < \infty$. Now an argument similar to that of Theorem 3.3 of Steland (2004b) shows that under the alternative the process in (4) diverges at the rate $N^{1/2}$, since

$$hN^{-1}\hat m_N(s) = \frac{h}{N} \sum_{i=1}^{\lfloor Ns \rfloor} K_h(t_i - t_{\lfloor Ns \rfloor}) \sum_{j=0}^{i} a^j m_0([i - j]/h) + o_P(1) = O\left( I(m_0) \int_0^{\zeta s} K(z - \zeta s)\, dz \right).$$

These results also have immediate implications for the sequential stopping rules. If

$$\mu_\zeta(s) = P\text{-}\lim_{N\to\infty} hN^{-1}\hat m_N(s),$$

it can be shown that for any fixed $0 < \kappa < 1$

$$N^{-1}\inf\{\lfloor \kappa N \rfloor \le n \le N : hN^{-1}\hat m_n > c\} \xrightarrow{P} \inf\{\kappa \le s \le 1 : \mu_\zeta(s) > c\},$$

as $N \to \infty$, i.e., the normed delay converges to a deterministic quantity.

3. ASYMPTOTICS FOR RANDOM WALKS 

Now we study the asymptotic behavior of the Nadaraya-Watson estimator $\hat m_n$ under the random walk model introduced in Section 1. Note that our asymptotic framework differs from the usual framework in nonparametric regression. We do not assume that the time points $\{t_i\}$ become dense in some finite time interval or are distributed according to a density, which would ensure that we may let the bandwidth $h$ tend to 0 at a certain rate. Instead we assume a fixed time design, taking into account that time series are commonly observed at a fixed time scale. Thus, as a by-product we provide the asymptotic laws of Nadaraya-Watson type smoothing under the sampling design of the present paper. We formulate the results for equidistant observations, i.e., $t_n = n$, and discuss more general time designs in Section 4.

The results of this section about the Nadaraya-Watson process $\hat m_N(s)$, $s \in [0, 1]$, are preparations for the analysis of the stopping time $S_N$, but since they are interesting in their own right we discuss them in detail here. In particular, the interesting relationship between the (qualitative) asymptotic behavior and the convergence rate of the local alternative is a property of that underlying process.

3.1. Limit theory under the null hypothesis. We first study the asymptotic distributions under the null hypothesis that we deal with a random walk without drift. The limit distributions are centered Gaussian processes and centered normal distributions, respectively.
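Theorem 3.1. Assume (A) and (K). Under the null hypothesis $H_0 : m_0 = 0$ we have

$$\frac{h}{N^{3/2}}\, \hat m_N \xrightarrow{d} \frac{\sigma \int_0^1 K(\zeta(r - 1))\, B(r)\, dr}{\zeta \int_0^1 K(\zeta(r - 1))\, dr},$$

as $N \to \infty$. The associated partial sum process converges weakly,

$$\frac{h}{N^{3/2}}\, \hat m_N(s) \Rightarrow \mathcal{M}_\zeta(s) = \frac{\sigma \int_0^s K(\zeta(r - s))\, B(r)\, dr}{\zeta \int_0^s K(\zeta(r - s))\, dr}, \qquad \text{in } D[0, 1],$$

as $N \to \infty$. The limit process is continuous w.p. 1.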

Observe that for $\sigma = 1$ the limit process $\mathcal{M}_\zeta(s)$ is distributed according to a $N(0, \sigma_\zeta^2(s))$ distribution with variance given by

$$\sigma_\zeta^2(s) = \frac{\int_0^s K(\zeta(u - s)) \left[ \int_0^u t\, K(\zeta(t - s))\, dt + u \int_u^s K(\zeta(t - s))\, dt \right] du}{\left( \zeta \int_0^s K(\zeta(r - s))\, dr \right)^2},$$

which can be calculated explicitly for any given kernel (Shorack and Wellner (1986), p. 42). The following table provides some values of $\sigma_\zeta^2 = \sigma_\zeta^2(1)$ for the Gaussian kernel, the Epanechnikov kernel given by $K_{Epan}(z) = (3/4)(1 - z^2)$ for $z \in [-1, 1]$, and the (standardized) Laplace kernel, which is defined by $K_{Lap}(z) = (1/\sqrt{2})\, e^{-\sqrt{2}|z|}$, $z \in \mathbb{R}$.

Kernel          ζ = 10   ζ = 5    ζ = 4    ζ = 2    ζ = 1.5  ζ = 1.2  ζ = 1
Gaussian        0.0089   0.0310   0.0449   0.1242   0.1913   0.2754   0.3775
Laplace         0.0089   0.0316   0.0463   0.1443   0.2310   0.3353   0.4578
Epanechnikov    0.0095   0.0359   0.0545   0.1857   0.2921   0.3968   0.4857

Table 1. Asymptotic variances $\sigma_\zeta^2$ for several choices of the kernel and $\zeta = \lim N/h$.
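The entries of Table 1 can be reproduced numerically. The Python sketch below assumes $\sigma = 1$ and evaluates $\sigma_\zeta^2(1)$ as the double integral $\int_0^1\int_0^1 K(\zeta(r-1))K(\zeta(t-1))\min(r,t)\,dr\,dt$ divided by $(\zeta\int_0^1 K(\zeta(r-1))\,dr)^2$, i.e., the variance of the weighted Brownian motion integral at $s = 1$, using a midpoint rule; the grid size and function names are our own choices.

```python
import math


def gaussian(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def epanechnikov(z):
    return 0.75 * (1.0 - z * z) if abs(z) <= 1.0 else 0.0


def laplace(z):
    # standardized Laplace kernel (variance 1)
    return math.exp(-math.sqrt(2.0) * abs(z)) / math.sqrt(2.0)


def asymptotic_variance(kernel, zeta, n=4000):
    """sigma_zeta^2(1) via the midpoint rule on an n-point grid of [0, 1]."""
    dr = 1.0 / n
    num, inner, mass = 0.0, 0.0, 0.0
    for k in range(n):
        r = (k + 0.5) * dr
        w = kernel(zeta * (r - 1.0))
        # cells with t < r contribute t * w(t) (running sum `inner`);
        # the diagonal cell contributes half of r * w(r)
        num += w * (inner + 0.5 * r * w * dr) * dr
        inner += r * w * dr
        mass += w * dr
    # 2 * num equals the full double sum of w(r) w(t) min(r, t) dr dt
    return 2.0 * num / (zeta * mass) ** 2
```

For example, `asymptotic_variance(gaussian, 1.0)` should be close to the tabulated 0.3775, and `asymptotic_variance(epanechnikov, 1.0)` close to 0.4857.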



Theorem 3.1 suggests the following confidence interval

$$\hat m_N \pm z_{1-\alpha/2}\, \sigma\, \sigma_\zeta\, h^{-1} N^{3/2}, \tag{5}$$

which has asymptotic coverage $1 - \alpha$ under $H_0$. It can be used to perform a preliminary level $\alpha$ test given data $Y_1, \dots, Y_N$ before establishing a monitoring procedure. The accuracy of that procedure is studied to some extent in Section 7. However, comparing $\hat m_N$ with the confidence limits $\pm z_{1-\alpha/2}\, \sigma\, \sigma_\zeta\, h^{-1} N^{3/2}$ does not ensure well-defined statistical properties of the associated stopping rule in terms of the average run length or the normed delay.
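A minimal Python sketch of the preliminary level-$\alpha$ test: it forms the interval $\hat m_N \pm z_{1-\alpha/2}\,\sigma\,\sigma_\zeta\,N^{3/2}/h$ and rejects $H_0$ if the interval does not contain 0. It assumes $\sigma$ and $\sigma_\zeta$ are given (e.g., $\sigma_\zeta$ as the square root of a Table 1 entry) and a Gaussian kernel; `null_interval` and `rejects_h0` are hypothetical helpers.

```python
import math
from statistics import NormalDist


def gaussian_kernel(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def null_interval(y, h, sigma, sigma_zeta, alpha=0.05, kernel=gaussian_kernel):
    """hat m_N +/- z_{1-alpha/2} * sigma * sigma_zeta * N^{3/2} / h."""
    N = len(y)
    w = [kernel((i - N) / h) for i in range(1, N + 1)]  # weights K((t_i - t_N)/h)
    m_hat = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half = z * sigma * sigma_zeta * N ** 1.5 / h
    return m_hat - half, m_hat + half


def rejects_h0(y, h, sigma, sigma_zeta, alpha=0.05):
    """Preliminary level-alpha test: reject H0 if 0 lies outside the interval."""
    lo, hi = null_interval(y, h, sigma, sigma_zeta, alpha)
    return not (lo <= 0.0 <= hi)
```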
Remark 3.1. Note that the event $T_n(N) = hN^{-3/2}\hat m_n > c$ stands for a false alarm at the $n$th time point if $m_0 = 0$. For fixed $n$, it is straightforward to show

$$P(hN^{-3/2}\hat m_n > c) = O(hN^{-3/2}) = O(N^{-1/2}),$$

i.e., in our framework the point-wise false-alarm rate tends to 0 as $N \to \infty$.

3.2. Limit theory under local drifts. We will now investigate the asymptotic behavior under the (local) alternative model and both model specifications for the change-point. It turns out that the result depends qualitatively on the rate parameter $\beta$ of the alternative. If $\beta = -1/2$, i.e., the alternative converges to the null model at the rate $h^{-3/2}$, we obtain a non-degenerate Gaussian limit with drift for the process $hN^{-3/2}\hat m_N(s)$ studied in Theorem 3.1 under the null hypothesis. That process has a proper asymptotic null distribution. For a slowly converging alternative ($\beta = 0$), corresponding to the rate $h^{-1}$, we have to change the scaling function to obtain a limit. In this case we obtain stochastic convergence to a non-stochastic function. That function determines the asymptotic detection properties of the proposed procedure. We formulate the results for the partial sum processes $\hat m_N(s)$; putting $s = 1$ yields the asymptotic laws of the Nadaraya-Watson estimator.

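Theorem 3.2. Assume (A), (K), (M), and (KM). Fix $0 < a < 1$. Under the alternative $H_1 : m_0 \ne 0$ the following assertions hold true.

(i) If $\beta = -1/2$, we have weakly in $D[a, 1]$,

$$\frac{h}{N^{3/2}}\, \hat m_N(s) \Rightarrow \frac{\sigma \int_0^s K(\zeta(r - s))\, B(r)\, dr + \int_0^s K(\zeta(r - s)) \int_0^r m_0(t - \zeta\vartheta 1_{CP2})\, dt\, dr}{\zeta \int_0^s K(\zeta(r - s))\, dr},$$

as $N \to \infty$. Here, $1_{CP2} = 0$ if change-point model CP1 holds, and $1_{CP2} = 1$ under model CP2.

(ii) If $\beta = 0$, then

$$\frac{h^{1/2}}{N^{3/2}}\, \hat m_N(s) \xrightarrow{P} \frac{\int_0^s K(\zeta(r - s)) \int_0^r m_0(t - \zeta\vartheta 1_{CP2})\, dt\, dr}{\zeta \int_0^s K(\zeta(r - s))\, dr},$$

as $N \to \infty$. Again, $1_{CP2} = 0$ if change-point model CP1 holds, and $1_{CP2} = 1$ under model CP2.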

Remark 3.2. Note that the asymptotic limit depends on the change-point parameter $\vartheta$ if model CP2 holds. Under model CP1 the limit is free of $t_q$, which is a consequence of $t_q/h = o(1)$ and the continuity of $m_0$.
Remark 3.3. It is worth noting that procedures based on the partial sum process $\hat m_N(s)$ are able to detect a drift if the function

$$\mu_\zeta(s) = \int_0^s K(\zeta(r - s)) \int_0^r m_0(t - \zeta\vartheta 1_{CP2})\, dt\, dr$$

is positive for some interval of $s$-values.

Remark 3.4. Note that (ii) implies that the statistic $hN^{-3/2}\hat m_N$ diverges under local alternatives corresponding to $\beta = 0$ at the rate $h^{1/2}$.

4. General time designs 

Let us briefly discuss more general time designs for the choice of the time points $t_n$. In some applications the following monitoring approach may be possible and reasonable. We monitor the process at equidistant time points $1, 2, \dots$ until either the procedure provides a signal or we have reached the time horizon (maximum sample size) $N$. Here we assume that the time unit is chosen appropriately, e.g., one day or one week. Intuitively, to detect a change as soon as possible it should be better to use more recent observations $Y_i$, i.e., with $t_n - t_i$ small, than past observations where $t_n - t_i$ is large. To some extent, this is achieved by the smoothing kernel, which downweights past data, but a real thinning of the data can only be achieved by an appropriate selection or design of the time points. This means that at the current time $t_{n,n} = n$ one chooses past time points $0 < t_{n,1} < \dots < t_{n,n-1} < t_{n,n}$ where observations are taken. This allows one to start with monthly observations and use daily observations at the end of the (current) sample. We consider two different approaches corresponding to the two change-point models CP1 and CP2.

4.1. Generalized time designs for the CP1 model. Assume that

$$t_{n,i} = n F_T^{-1}(i/n), \qquad i = 1, \dots, n, \ n \in \mathbb{N}, \tag{6}$$

where $F_T$ is a continuously differentiable d.f. with support $[0, 1]$. Clearly, if $F_T$ is the d.f. of the uniform distribution on $[0, 1]$, we obtain $t_{n,i} = i$. Nonlinear choices of $F_T$ allow one to ensure that past or more recent observations dominate the sample. Note that $F_T^{-1}$ defines a sampling scheme which is rolled over the time axis: at each time $n$ the time points $t_{n,1}, \dots, t_{n,n-1}$ are chosen according to the scheme $F_T^{-1}$.

Under model CP1, a Taylor expansion yields $t_{n,q} = (F_T^{-1})'(0)\, q + o(1)$, provided $F_T^{-1}$ is continuously differentiable (note that $F_T^{-1}(0) = 0$, since $F_T$ has support $[0, 1]$). Thus, if $(F_T^{-1})'(0) > 0$, the underlying (asymptotic) change-point equals $(F_T^{-1})'(0)\, t_q$, whereas for $(F_T^{-1})'(0) = 0$ the sequence of change-points vanishes asymptotically, i.e., the detection problem is made easier as $n$ increases.
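The rolling design (6) is easy to compute. The Python sketch below uses the hypothetical quantile function $F_T^{-1}(u) = u(3 - u)/2$, which is concave with $(F_T^{-1})'(0) = 3/2$, so sampling becomes denser towards the current time and the asymptotic change-point is $1.5\, t_q$. The helper `snap_to_grid` assumes that the finest available time scale is $1, 2, \dots$ and picks the nearest admissible time points, in the spirit of Remark 4.1; all names are illustrative.

```python
def rolling_design(n, quantile):
    """Time points t_{n,i} = n * F_T^{-1}(i/n), i = 1, ..., n, cf. (6)."""
    return [n * quantile(i / n) for i in range(1, n + 1)]


def concave_quantile(u):
    """Hypothetical F_T^{-1}(u) = u(3 - u)/2 on [0, 1]; derivative 3/2 at u = 0."""
    return u * (3.0 - u) / 2.0


def snap_to_grid(points):
    """Nearest distinct integer time points, assuming the finest time scale is 1, 2, ..."""
    return sorted({max(1, round(t)) for t in points})
```

For $n = 10$ the design yields the spacings $1.35, 1.25, \dots, 0.55$, so recent observations are taken more frequently than past ones.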




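The associated Nadaraya-Watson process is given by

$$\hat m_N(s) = \frac{\sum_{i=1}^{\lfloor Ns \rfloor} K_h(t_{\lfloor Ns \rfloor, i} - \lfloor Ns \rfloor)\, Y_{t_{\lfloor Ns \rfloor, i}}}{\sum_{i=1}^{\lfloor Ns \rfloor} K_h(t_{\lfloor Ns \rfloor, i} - \lfloor Ns \rfloor)},$$

where again $\lfloor Ns \rfloor$ plays the role of the current time point. It is straightforward to check that the proofs of Theorem 3.1 and Theorem 3.2 still work. Now the limit process under the null hypothesis is given by

The drift term appearing in Theorem 3.2 changes to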

Remark 4.1. In practice, it may be necessary to use, instead of $t_{n,i}$, the nearest time point $t^*_{n,i}$ of the finest discrete time scale available. Then $F_T^{-1}$ defines a selection rule for the time points $\{t^*_{n,i}\}$.

4.2. Generalized time designs for the CP2 model. It is easy to see that the generalized time design above does not make much sense under model CP2. One may consider the following modification, which is easier to apply but lacks the original idea of allowing for schemes which use more observations near each current time $n$. Assume

$$t_{N,i} = N F_T^{-1}(i/N), \qquad i = 1, \dots, N, \tag{7}$$

where $F_T$ is a continuously differentiable d.f. with support $[0, 1]$. Here, given the maximum sample size $N$, the time design scheme is set up only once, i.e., the selected time points do not change with the current time $n$. Since under model CP2 the change-point is given by $t_q = t_{N\vartheta} = \lfloor N\vartheta \rfloor$, we obtain

$$t_{N,q} = N F_T^{-1}(\lfloor N\vartheta \rfloor / N),$$

yielding $t_{N,q}/N \to F_T^{-1}(\vartheta)$. This means that the (asymptotic) change-point parameter is transformed by $F_T^{-1}$, and it appears in the asymptotic limit. The associated Nadaraya-Watson process is now defined by



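$$\hat m_N(s) = \frac{\sum_{i=1}^{\lfloor Ns \rfloor} K\big( (N/h)[F_T^{-1}(i/N) - F_T^{-1}(\lfloor Ns \rfloor/N)] \big)\, Y_{t_{N,i}}}{\sum_{i=1}^{\lfloor Ns \rfloor} K\big( (N/h)[F_T^{-1}(i/N) - F_T^{-1}(\lfloor Ns \rfloor/N)] \big)}.$$

A straightforward calculation shows that the drift term now changes to

$$\frac{\int_0^s K(\zeta[F_T^{-1}(r) - F_T^{-1}(s)]) \int_0^r m_0(\zeta[F_T^{-1}(t) - F_T^{-1}(\vartheta)])\, dt\, dr}{\int_0^s K(\zeta[F_T^{-1}(r) - F_T^{-1}(s)])\, dr}.$$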
Note that Remark 4.1 also applies to the time design scheme (7).
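For the fixed design (7), a Python sketch can compute the time points, the normalized change-point $t_{N,q}/N = F_T^{-1}(\lfloor N\vartheta \rfloor/N)$, and a smoother with weights $K((N/h)[F_T^{-1}(i/N) - F_T^{-1}(\lfloor Ns \rfloor/N)])$. The quantile function $F_T^{-1}(u) = u(3 - u)/2$, the Gaussian kernel, and all function names are hypothetical choices for illustration; `y[i-1]` is assumed to hold the observation taken at time $t_{N,i}$.

```python
import math


def gaussian_kernel(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def concave_quantile(u):
    """Hypothetical F_T^{-1}(u) = u(3 - u)/2 on [0, 1]."""
    return u * (3.0 - u) / 2.0


def fixed_design(N, quantile):
    """Time points t_{N,i} = N * F_T^{-1}(i/N), i = 1, ..., N, cf. (7); chosen once."""
    return [N * quantile(i / N) for i in range(1, N + 1)]


def normalized_change_point(N, theta, quantile):
    """t_{N,q}/N = F_T^{-1}(floor(N theta)/N), which tends to F_T^{-1}(theta)."""
    return quantile(math.floor(N * theta) / N)


def nw_fixed_design(y, N, h, s, quantile, kernel=gaussian_kernel):
    """Smoother at s >= 1/N using the first floor(N s) design points."""
    n = math.floor(N * s)
    current = quantile(n / N)
    w = [kernel((N / h) * (quantile(i / N) - current)) for i in range(1, n + 1)]
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
```

With this quantile function a change at $\vartheta = 0.5$ is mapped to $F_T^{-1}(0.5) = 0.625$ on the transformed time scale.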



5. Sequential detection rules 



Let us now discuss the implications of the results of Section 3 for the stopping rule $S_N = \inf\{1 \le n \le N : T_n > c\}$. Note that $S_N$ can be written in terms of the sequential partial sum processes. Indeed, $S_N = N \cdot \inf\{0 \le s \le 1 : c(h, N)\hat m_N(s) > c\}$. For asymptotic results under local alternatives we also consider the stopping rule

$$S_N^{(a)} = \inf\{\lfloor Na \rfloor \le n \le N : c(h, N)\, \hat m_N(n/N) > c\},$$

where $a \in (0, 1)$ is a fixed constant. Again notice that $S_N^{(a)}$ can be written as $N \cdot \inf\{a \le s \le 1 : c(h, N)\hat m_N(s) > c\}$.

5.1. Limit theory under the null hypothesis. The following theorem provides the null distribution of the stopping rules.

Theorem 5.1. Assume (A), (K), and $H_0 : m_0 = 0$ (random walk without drift).

(i) If $T_n(s) = c(h, N)\hat m_N(s)$ with scaling factor $c(h, N) = hN^{-3/2}$, the normed stopping time $S_N/N$ converges in distribution to the random variable

$$\inf\{s \in [0, 1] : \mathcal{M}_\zeta(s) > c\}.$$

(ii) The limiting laws of the nuisance-free versions correspond to the special case $\sigma = 1$.

These results can be used to choose the threshold (critical value) $c$ from the asymptotic distribution. For example, we may simulate trajectories from the limiting processes and determine for each trajectory the smallest $s$ such that the threshold $c$ is exceeded. This gives an approximation of the distribution of $S_N$, which can be used to choose $c$ such that, e.g., the average run length equals a prespecified value.
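The simulation approach just described can be sketched in Python as follows. We assume the nuisance-free case $\sigma = 1$, a Gaussian kernel, and that the limiting process has the form $\mathcal{M}_\zeta(s) = \int_0^s K(\zeta(r-s))B(r)\,dr \big/ \big(\zeta\int_0^s K(\zeta(r-s))\,dr\big)$ (our reading of Theorem 3.1); the integrals are replaced by Riemann sums on the grid $k/m$, and the grid size, number of paths, and all names are illustrative choices. The same simulated paths are used for every candidate $c$, so the estimated normed ARL is nondecreasing in $c$.

```python
import math
import random


def gaussian_kernel(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def limit_path(zeta, m, rng, kernel=gaussian_kernel):
    """Riemann-sum approximation of M_zeta(k/m), k = 1..m, for sigma = 1."""
    b, bm = 0.0, []
    for _ in range(m):  # Brownian motion at r_j = j/m
        b += rng.gauss(0.0, math.sqrt(1.0 / m))
        bm.append(b)
    kv = [kernel(-zeta * d / m) for d in range(m)]  # K(zeta (r_j - s_k)), d = k - j
    path = []
    for k in range(1, m + 1):
        num = sum(kv[k - j] * bm[j - 1] for j in range(1, k + 1))
        den = zeta * sum(kv[k - j] for j in range(1, k + 1))
        path.append(num / den)  # the grid width 1/m cancels in the ratio
    return path


def normed_arl(c_values, zeta, m=100, n_paths=200, seed=0):
    """Monte Carlo estimate of E inf{s : M_zeta(s) > c}, truncated at 1, for each c."""
    rng = random.Random(seed)
    totals = {c: 0.0 for c in c_values}
    for _ in range(n_paths):
        path = limit_path(zeta, m, rng)  # common random numbers for all c
        for c in c_values:
            totals[c] += next((k / m for k, v in enumerate(path, start=1) if v > c), 1.0)
    return {c: totals[c] / n_paths for c in c_values}


def choose_threshold(c_grid, target, zeta, **kw):
    """Smallest grid value c whose estimated normed ARL reaches `target`."""
    arl = normed_arl(c_grid, zeta, **kw)
    for c in sorted(c_grid):
        if arl[c] >= target:
            return c, arl[c]
    return None, None
```

For instance, `choose_threshold([0.05 * j for j in range(1, 41)], target=0.8, zeta=2.0)` searches a grid of thresholds for a normed ARL of 0.8; finer grids and more paths reduce the Monte Carlo and discretization error.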

5.2. Limit theory under local drifts. The following results summarize our findings under local alternatives and give interesting insights into the asymptotic properties of the procedure. In particular, we see how the smoothing kernel and the generic alternative $m_0$ jointly affect the performance of the procedures.

Theorem 5.2. Assume (A), (K), (M), and (KM) (random walk with local drift). Fix $a \in (0, 1)$ and let

$$S_N^{(a)} = \inf\{\lfloor Na \rfloor \le n \le N : c(h, N)\, \hat m_N(n/N) > c\}.$$
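(i) Suppose $\beta = -1/2$. If $T_n(s) = c(h, N)\hat m_N(s)$ with scaling factor $c(h, N) = hN^{-3/2}$, the normed stopping time $S_N^{(a)}/N$ converges weakly to the random variable

$$S(\zeta) = \inf\{s \in [a, 1] : W_\zeta(s) > c\},$$

where the stochastic process $W_\zeta(s)$ is given by

$$W_\zeta(s) = \frac{\sigma \int_0^s K(\zeta(r - s))\, B(r)\, dr + \int_0^s K(\zeta(r - s)) \int_0^r m_0(t - \zeta\vartheta 1_{CP2})\, dt\, dr}{\zeta \int_0^s K(\zeta(r - s))\, dr},$$

as $N \to \infty$.

(ii) Suppose $\beta = 0$. If $T_n(s) = c(h, N)\hat m_N(s)$ with scaling factor $c(h, N) = h^{1/2}N^{-3/2}$, the normed stopping time $S_N^{(a)}/N$ converges in probability to the non-stochastic asymptotic normed delay

$$S^*(\zeta; K; m_0) = \inf\left\{ s \in [a, 1] : \frac{\int_0^s K(\zeta(r - s)) \int_0^r m_0(t - \zeta\vartheta 1_{CP2})\, dt\, dr}{\zeta \int_0^s K(\zeta(r - s))\, dr} > c \right\},$$

as $N \to \infty$.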

This theorem says that the stopping rule relying on the control statistic $hN^{-3/2}\hat m_N$, which has a proper limit under $H_0$, has a nondegenerate limit distribution under local alternatives converging to 0 at the rate $h^{-3/2}$. If, however, we consider alternatives with rate $h^{-1}$, which is the appropriate rate in the stationary case (see Steland, 2004b), and change the scaling function, we obtain a deterministic limit $S^*(\zeta; K; m_0)$, the asymptotic normed delay, as in the case of a stationary process.



6. Optimal kernel choice 



Suppose the critical value $c$ is a fixed constant chosen by the data analyst. For example, when analyzing a time series representing financial risk measured in terms of a currency unit, $c$ may be a psychological price. Then $S_N$ stands for the time point where that price is reached for the first time.

Assuming the change-point model CP1, Theorem 5.2 (ii) motivates examining whether optimal kernels exist which minimize the asymptotic normed delay $S^*(\zeta; K; m_0)$ for a given alternative $m_0$ representing a worst-case scenario. Recall that this deterministic quantity appears as the limit if the alternative model converges to the null model at the rate $h^{-1}$, whereas for the faster rate $h^{-3/2}$ we obtained a stochastic limit. From a practical viewpoint, considering the conditions for a slower convergence to 0 may provide a better approximation to reality.




First note that for a finite set of candidate kernels $\{K_1, \dots, K_M\}$, we can simply plot the $M$ corresponding curves

$$\mu_i(s) = \frac{\int_0^s K_i(\zeta(r - s)) \int_0^r m_0(t)\, dt\, dr}{\zeta \int_0^s K_i(\zeta(r - s))\, dr}, \qquad i = 1, \dots, M,$$

and use the kernel which provides the smallest $s$ where the critical value $c$ is exceeded. For the case of detecting a drift in a stationary process, Steland (2002a) provides a real data analysis of credit risk data, where this simple procedure yields a detection rule which signals the change one time point earlier. For a Bayesian view on the problem of kernel optimization see Steland (2002b).
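The finite-candidate procedure can be sketched in Python. We assume model CP1 and that each curve has the form $\int_0^s K_i(\zeta(r-s))\int_0^r m_0(t)\,dt\,dr \big/ \big(\zeta\int_0^s K_i(\zeta(r-s))\,dr\big)$; the integrals are approximated by midpoint sums, and the kernels, the grid size, and the names `drift_curve` and `best_kernel` are illustrative choices.

```python
import math


def gaussian(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def epanechnikov(z):
    return 0.75 * (1.0 - z * z) if abs(z) <= 1.0 else 0.0


def laplace(z):
    return math.exp(-math.sqrt(2.0) * abs(z)) / math.sqrt(2.0)


def drift_curve(kernel, m0, zeta, m=400):
    """Pairs (s_k, mu(s_k)) on the grid s_k = k/m, k = 1..m (midpoint sums)."""
    dr = 1.0 / m
    r = [(j + 0.5) * dr for j in range(m)]
    big_f, acc = [], 0.0
    for x in r:  # F(r) = int_0^r m0(t) dt at the cell midpoints
        big_f.append(acc + 0.5 * m0(x) * dr)
        acc += m0(x) * dr
    curve = []
    for k in range(1, m + 1):
        s = k * dr
        w = [kernel(zeta * (r[j] - s)) for j in range(k)]  # cells inside [0, s]
        curve.append((s, sum(wj * fj for wj, fj in zip(w, big_f)) / (zeta * sum(w))))
    return curve


def best_kernel(kernels, m0, zeta, c):
    """Kernel whose curve exceeds c first; None means no crossing in [0, 1]."""
    first = {name: next((s for s, v in drift_curve(k, m0, zeta) if v > c), None)
             for name, k in kernels.items()}
    crossing = {name: s for name, s in first.items() if s is not None}
    best = min(crossing, key=crossing.get) if crossing else None
    return best, first
```

Returning the first crossing of every candidate, not only the winner, mirrors the graphical comparison of the $M$ curves described above.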

Although we can provide a solution to the problem of optimal kernel choice, the results seem to be of limited practical use, since we can identify the optimal kernel only for a finite interval around 0. Nevertheless, from a theoretical point of view it is interesting to know that both the asymptotic normed delay and the optimal kernel can be calculated explicitly for any given generic alternative $m_0$.

Let $\mathcal{K}$ denote a class of probability densities with expectation 0 which is uniformly Lipschitz continuous, i.e.,

$$\sup_{K \in \mathcal{K}} |K(z_1) - K(z_2)| \le L\,|z_1 - z_2|, \qquad \forall z_1, z_2 \in \mathbb{R},$$

holds for some constant $L > 0$. The problem is to find a kernel $K^* \in \mathcal{K}$ such that the corresponding asymptotic normed delay $S^*(\zeta; K^*; m_0)$ satisfies

$$S^*(\zeta; K^*; m_0) = \inf\{S^*(\zeta; K; m_0) : K \in \mathcal{K}\}.$$

Such a pair $(K^*, S^*(\zeta; K^*; m_0))$ is called optimal. Using optimization techniques presented in detail in Steland (2004b), one can establish the following theorem, which provides a way to calculate the optimal asymptotic normed delay and the optimal kernel $K^*$.

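Theorem 6.1. Suppose that for all $s \in [0, 1]$

$$\int_0^s \left( \int_0^r m_0(t)\, dt \right)^2 dr < \infty.$$

(i) The optimal asymptotic normed delay is given by

$$S^*(\zeta; K^*; m_0) = \inf\left\{ s \in [0, 1] : \frac{\int_0^s \left( \int_0^r m_0(t)\, dt \right)^2 dr}{\zeta \int_0^s \int_0^r m_0(t)\, dt\, dr} > c \right\}.$$

(ii) The optimal kernel $K^*$ satisfies

$$K^*(z) = \frac{\int_0^{S^* - |z|/\zeta} m_0(t)\, dt}{2\zeta \int_0^{S^*} \int_0^r m_0(t)\, dt\, dr}, \qquad S^* = S^*(\zeta; K^*; m_0),$$

for arguments $z \in [-\zeta S^*(\zeta; K^*; m_0), \zeta S^*(\zeta; K^*; m_0)]$.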



7. Simulations 



To study the accuracy of the asymptotic distributions of the detection procedures, we simulated random walks $\{Y_n\}$, where $Y_0 = 0$ and $Y_{n+1} = Y_n + u_n$ with $\{u_n\}$ i.i.d. $N(0, \sigma^2)$, $\sigma = 1$. To estimate the nuisance parameter $\sigma^2$ we assumed that an additional prerun random walk of length 10 was given.

Figure 1 shows 20 realizations of the kernel-weighted sequential partial sum process $\hat m_{Nh}(s)$, $s \in [0,1]$, for $N = 100$ and $h = 50$, together with its asymptotic approximation via the kernel-weighted integral over Brownian motion using $\zeta = 2$. The sequential detection procedure $S_N$ can be visualized by drawing a horizontal line (control limit) at $c$. The first intersection of the process and the control limit is the run length.
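The construction just described can be mimicked with a short simulation. The sketch below assumes $K_h(z) = K(z/h)/h$ with design points $t_i = i$, takes $\hat m_{Nh}(n/N)$ to be the $K_h$-weighted average of $Y_1, \dots, Y_n$ (consistent with the sums in the proof of Theorem 3.1), normalizes it to $\hat\sigma^{-1} h N^{-3/2} \hat m_{Nh}(n/N)$, and estimates $\sigma$ by the sample standard deviation of the increments of the length-10 prerun. These choices, the function names and the convention of returning $N$ when no alarm occurs are assumptions made for illustration.

```python
import math
import random
import statistics


def gaussian_kernel(z):
    """Standard normal density, used as the kernel K."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def simulate_walk(n, rng, sigma=1.0):
    """Y_1, ..., Y_n of a driftless random walk with Y_0 = 0 and N(0, sigma^2) innovations."""
    y, path = 0.0, []
    for _ in range(n):
        y += rng.gauss(0.0, sigma)
        path.append(y)
    return path


def sigma_from_prerun(prerun):
    """Hypothetical estimator: sample standard deviation of the prerun increments."""
    increments = [b - a for a, b in zip([0.0] + prerun[:-1], prerun)]
    return statistics.stdev(increments)


def normed_statistic(path, n, N, h, sigma_hat, kernel=gaussian_kernel):
    """sigma_hat^{-1} h N^{-3/2} m_hat_{Nh}(n/N), where m_hat is the weighted average of
    Y_1..Y_n with weights K_h(t_i - t_n) = K((i - n)/h)/h."""
    weights = [kernel((i - n) / h) / h for i in range(1, n + 1)]
    m_hat = sum(w * y for w, y in zip(weights, path)) / sum(weights)
    return h * m_hat / (sigma_hat * N ** 1.5)


def run_length(path, N, h, sigma_hat, c, kernel=gaussian_kernel):
    """First n with a statistic above the control limit c; N if there is none."""
    for n in range(1, N + 1):
        if normed_statistic(path, n, N, h, sigma_hat, kernel) > c:
            return n
    return N


if __name__ == "__main__":
    rng = random.Random(1)
    N, h = 100, 50  # zeta = N / h = 2, as in Figure 1
    prerun = simulate_walk(10, rng)
    path = simulate_walk(N, rng)
    print(run_length(path, N, h, sigma_from_prerun(prerun), c=0.1))
```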

To study the accuracy of the asymptotic null distribution we performed simulations to assess the coverage of the confidence interval based on $\hat m_{Nh}$ and the average run length (ARL) of the stopping rule $S_N$. We focus on the ARL, since it is perhaps the most common criterion for designing monitoring procedures in practical applications. Note, however, that our results also allow one to design procedures which control the type I error rate.

Table 2 reports the simulated coverage probabilities of the confidence interval defined in (5) for a Gaussian kernel and a nominal coverage of 0.95 under the null hypothesis. The results for the Epanechnikov and Laplace kernels were in close agreement and are not reported here. Each value is estimated from 10,000 repetitions. The asymptotic variance is estimated using the estimator $\hat\sigma^2$ and $a_\zeta$ as given in Table 1. It can be seen that coverage is good even for $h \ll N$ and small $N$.

In order to simplify the application of the proposed sequential monitoring procedure, we provide curves to obtain approximate critical values that achieve a prespecified ARL, $E_0(S_N)$, under the null hypothesis $H_0: m_0 = 0$. Figure 2 provides curves of the normed ARL $a_0 = E_0(S_N)/N$ as a function of $c$, i.e., $a_0 = a_0(c)$. For given $(N, h)$, use the curve for $\zeta \approx N/h$ and determine $c$ graphically such that $N a_0(c)$ equals the desired ARL.
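Instead of reading $c$ off Figure 2, the normed ARL $a_0(c)$ can also be estimated by Monte Carlo simulation under $H_0$ and inverted numerically. The sketch below simulates the null trajectories once and then evaluates $a_0(c)$ for any $c$; because $a_0$ is nondecreasing in $c$, a bisection search is cheap. It rests on the same assumptions as a plain reimplementation would have to make: Gaussian kernel, $K_h(z) = K(z/h)/h$, $t_i = i$, $\hat\sigma$ taken as the sample standard deviation of 10 prerun increments, and run length $N$ when no alarm occurs. All function names are illustrative.

```python
import math
import random
import statistics


def gaussian_kernel(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def simulate_null_trajectories(N, h, reps, prerun_len=10, seed=0):
    """Null trajectories (T_1, ..., T_N), T_n = sigma_hat^{-1} h N^{-3/2} m_hat_{Nh}(n/N)."""
    rng = random.Random(seed)
    w = [gaussian_kernel(d / h) / h for d in range(N)]  # w[d] = K_h(-d); K is symmetric
    out = []
    for _ in range(reps):
        increments = [rng.gauss(0.0, 1.0) for _ in range(prerun_len)]
        sigma_hat = statistics.stdev(increments)
        y, path = 0.0, []
        for _ in range(N):
            y += rng.gauss(0.0, 1.0)
            path.append(y)
        traj = []
        for n in range(1, N + 1):
            num = sum(w[n - i] * path[i - 1] for i in range(1, n + 1))
            den = sum(w[:n])
            traj.append(h * (num / den) / (sigma_hat * N ** 1.5))
        out.append(traj)
    return out


def normed_arl(trajectories, c):
    """Estimated a_0(c): mean of (first n with T_n > c, else N), divided by N."""
    N = len(trajectories[0])
    total = 0
    for traj in trajectories:
        total += next((n for n, t in enumerate(traj, start=1) if t > c), N)
    return total / (len(trajectories) * N)


def critical_value(trajectories, target, lo=0.0, hi=1.0, iters=40):
    """Bisection for (approximately) the smallest c with estimated a_0(c) >= target."""
    if normed_arl(trajectories, hi) < target:
        raise ValueError("increase hi: the target normed ARL is not reached")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if normed_arl(trajectories, mid) < target:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    trajs = simulate_null_trajectories(N=100, h=50, reps=200, seed=42)
    print(critical_value(trajs, target=0.5))
```

Simulating once and reusing the trajectories keeps the estimated $a_0(c)$ monotone in $c$, which is what makes the bisection well defined.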
ζ       10      50      100     250     500
10      0.9502  0.9496  0.9523  0.9502  0.9471
5       0.9481  0.9514  0.9478  0.9489  0.9534
4       0.9475  0.9525  0.9474  0.9473  0.9515
2       0.9408  0.9468  0.9480  0.9458  0.9518
1.5     0.9350  0.9431  0.9516  0.9512  0.9485
1.2     0.9320  0.9453  0.9523  0.9477  0.9518
1       0.9301  0.9526  0.9470  0.9494  0.9504

Table 2. Coverage probabilities of a 0.95-confidence interval for random walks.



Figure 1. 20 realizations of the kernel-weighted sequential process $\hat m_{Nh}(s)$, $s \in [0, 1]$ (bold lines), and its asymptotic limit (dashed line). Horizontal axis: time.



How accurate is that approximation? To gain some insight we compared the asymptotic distribution of the stopping time

$$ S_\zeta = \inf\left\{ 0 \le s \le 1 : \frac{\int_0^s K(\zeta(r-s)) B(r)\,dr}{\zeta \int_0^s K(\zeta(r-s))\,dr} > c \right\} $$
Figure 2. Normed ARL curves for $h N^{-3/2} \hat m_{Nh}$ using the Gaussian kernel. $\zeta$ attains the values 1 (bottom curve), 1.5, 2, 3, 4, 5, and 10 (top curve). Horizontal axis: threshold $c$.

with the true distribution of the normed stopping time

$$ S_N / N = \inf\{ 1 \le n \le N : \hat\sigma^{-1} h N^{-3/2} \hat m_{Nh}(n/N) > c \} / N $$

in terms of the ARL. Each ARL was approximated using 10,000 trajectories.

Figure 3 provides the results. For $h \in \{10, 20, 50, 100\}$, $N = \zeta h$, and $\zeta = 3$ (left panel) and $\zeta = 10$ (right panel), the corresponding normed ARL curves are shown. It can be seen that the curves representing the asymptotic critical values lie below the simulated true curves. This means that the asymptotic critical values yield conservative procedures. The accuracy seems to be better for large values of $\zeta$, i.e., if $h$ is small compared to $N$.
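The asymptotic stopping time $S_\zeta$ can be approximated in the same spirit by discretizing the Brownian motion on a grid of $M$ points and replacing both integrals in $\int_0^s K(\zeta(r-s))B(r)\,dr / (\zeta \int_0^s K(\zeta(r-s))\,dr)$ by Riemann sums. This is a minimal sketch, assuming the Gaussian kernel, a grid size $M$ chosen for illustration, and the convention $S_\zeta = 1$ when the threshold is not crossed on $[0, 1]$; averaging the simulated stopping times gives an approximate asymptotic normed ARL of the kind shown by the dashed curves. The function names are illustrative.

```python
import math
import random


def gaussian_kernel(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def asymptotic_stopping_times(zeta, c, reps, M=500, seed=0):
    """Simulated S_zeta on the grid s = j/M (1.0 if c is not exceeded on [0, 1])."""
    rng = random.Random(seed)
    k = [gaussian_kernel(zeta * d / M) for d in range(M)]  # K(zeta(r_i - s_j)), d = j - i
    step_sd = math.sqrt(1.0 / M)
    times = []
    for _ in range(reps):
        b, B = 0.0, []
        for _ in range(M):
            b += rng.gauss(0.0, step_sd)
            B.append(b)  # B[i - 1] approximates B(i / M)
        stop = 1.0
        for j in range(1, M + 1):
            # The Riemann-sum mesh 1/M cancels between numerator and denominator.
            num = sum(k[j - i] * B[i - 1] for i in range(1, j + 1))
            den = zeta * sum(k[:j])
            if num / den > c:
                stop = j / M
                break
        times.append(stop)
    return times


def asymptotic_normed_arl(zeta, c, reps=1000, M=500, seed=0):
    times = asymptotic_stopping_times(zeta, c, reps, M, seed)
    return sum(times) / len(times)


if __name__ == "__main__":
    print(asymptotic_normed_arl(zeta=3.0, c=0.1, reps=200, M=300))
```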
Figure 3. Normed ARL curves for the nuisance-free control statistic $T_N^* = \hat\sigma^{-1} h N^{-3/2} \hat m_{Nh}$ using the Gaussian kernel; $h$ takes on the values 10, 20, 50, 100, and $N = \zeta h$. Left panel: $\zeta = 3$. Right panel: $\zeta = 10$. The dashed curves represent normed ARLs of the asymptotic distribution.



APPENDIX: PROOFS 



In this paper we work with weak convergence (denoted by $\Rightarrow$) of elements of the space $(D[0, 1], d)$, where $d$ is the Skorokhod metric. For treatments of the general theory we refer to Billingsley (1968), Pollard (1985), and van der Vaart and Wellner (1996).

Proof (of Theorem 3.1). Put $Y_0 = 0$ and define

$$ X_N(r; s) = N^{-1} Y_{\lfloor Nr \rfloor} K_h(t_{\lfloor Nr \rfloor} - t_{\lfloor Ns \rfloor}), \qquad r, s \in [0, 1]. $$

Note that $X_N(r; s)$ is constant on the intervals $[i/N, (i+1)/N)$ with value $N^{-1} Y_i K_h(t_i - t_{\lfloor Ns \rfloor})$, $i = 1, \dots, N$. Therefore, the area under the curve $X_N(r; s)$, $r \in [0, s]$, is given by



$$ \int_0^s X_N(r; s)\,dr = \frac{1}{N^2} \sum_{i=1}^{\lfloor Ns \rfloor} K_h(t_i - t_{\lfloor Ns \rfloor}) Y_i. $$

Using $Y_{\lfloor Nr \rfloor} = \sum_{i=1}^{\lfloor Nr \rfloor} u_i$ we have $h N^{1/2} X_N(r; s) = N^{-1/2} \sum_{i=1}^{\lfloor Nr \rfloor} u_i \cdot K\big( (t_{\lfloor Nr \rfloor} - t_{\lfloor Ns \rfloor})/h \big)$. Since by assumption (A) the partial sum process $N^{-1/2} \sum_{i=1}^{\lfloor Nr \rfloor} u_i$ converges weakly to scaled Brownian motion $\sigma B(r)$, we may apply the a.s. representation theorem of Skorokhod and Dudley (Pollard (1984), p. 71), which ensures that there exist versions of the random elements which converge a.s. in the sup-norm. This implies



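$$ \sup_{(\sigma_1, \sigma_2) \in [0,1]^2} \left| \frac{1}{\sqrt{N}} \sum_{i=1}^{\lfloor N\sigma_1 \rfloor} u_i\, K\!\left( \frac{t_{\lfloor N\sigma_1 \rfloor} - t_{\lfloor N\sigma_2 \rfloor}}{h} \right) - \sigma B(\sigma_1) K(\zeta[\sigma_1 - \sigma_2]) \right| \to 0 \quad \text{a.s.}, $$

which proves weak convergence $h N^{1/2} X_N(r; s) \Rightarrow \sigma K(\zeta(r - s)) B(r)$ in $\ell^\infty([0, 1] \times [0, 1])$.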
By continuity of $K$ and of the Brownian sample paths, the process $\sigma K(\zeta(r - s)) B(r)$, $(s, r) \in [0, 1] \times [0, 1]$, has continuous and bounded sample paths w.p. 1. Consider the integral operator $I$ which maps an element $f \in D([0,1] \times [0,1])$ to the element $I(f) \in D[0,1]$ given by $I(f)(s) = \int_0^s f(r, s)\,dr$, $s \in [0, 1]$. If $(f_n) \subset D([0,1] \times [0,1])$ is a convergent sequence with limit $f \in C([0,1] \times [0,1])$, i.e., $d(f_n, f) \to 0$ as $n \to \infty$, then we also have $\|f_n - f\|_\infty \to 0$ as $n \to \infty$, yielding $\|I(f_n) - I(f)\|_\infty \to 0$ as $n \to \infty$, i.e., continuity of $I$ at every $f \in C([0,1] \times [0,1])$. Hence, the continuous mapping theorem yields

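$$ \frac{h}{N^{3/2}} \sum_{i=1}^{\lfloor N\sigma_1 \rfloor} K_h(t_i - t_{\lfloor N\sigma_1 \rfloor}) Y_i = I\big( h N^{1/2} X_N(\sigma_2; \sigma_1) \big)(\sigma_1) \Rightarrow \sigma \int_0^{\sigma_1} K(\zeta(r - \sigma_1)) B(r)\,dr, $$

weakly in $D[0, 1]$, as $N \to \infty$. Since, additionally,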



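$$ \sum_{i=1}^{\lfloor Ns \rfloor} K_h(t_i - t_{\lfloor Ns \rfloor}) \to \int_0^{\zeta s} K(r - \zeta s)\,dr = \zeta \int_0^s K(\zeta(r - s))\,dr \tag{8} $$

as $N \to \infty$, the assertions follow.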
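The convergence (8) is easy to check numerically. For the Gaussian kernel the limit equals $\zeta \int_0^s K(\zeta(r-s))\,dr = \Phi(0) - \Phi(-\zeta s)$, where $\Phi$ is the standard normal distribution function. The sketch assumes $t_i = i$ and $K_h(z) = K(z/h)/h$, under which the left-hand side of (8) is a Riemann sum of this integral with mesh $1/h$; the function names are illustrative.

```python
import math


def gaussian_kernel(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def kernel_weight_sum(N, h, s, kernel=gaussian_kernel):
    """sum_{i=1}^{floor(Ns)} K_h(t_i - t_floor(Ns)) with t_i = i and K_h(z) = K(z/h)/h."""
    n = math.floor(N * s)
    return sum(kernel((i - n) / h) / h for i in range(1, n + 1))


def limit_of_weight_sum(zeta, s):
    """zeta * int_0^s K(zeta(r - s)) dr for the Gaussian kernel."""
    return std_normal_cdf(0.0) - std_normal_cdf(-zeta * s)


if __name__ == "__main__":
    for h in (10, 100, 1000):
        N = 4 * h  # zeta = N / h = 4
        print(h, kernel_weight_sum(N, h, 0.5), limit_of_weight_sum(4.0, 0.5))
```

The discrepancy shrinks roughly like $1/h$, in line with the Riemann-sum interpretation.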

Proof (of Theorem 3.2). A random walk with non-vanishing drift, $Y_{n+1} = Y_n + m_n + u_n$, can be decomposed as $Y_n = \tilde Y_n + \sum_{s=1}^{n-1} m_s$, $n \in \mathbb{N}$, where $\tilde Y_n = \sum_{s=1}^{n} u_s$ is a random walk based on the innovations $u_n$ without drift. Hence,

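$$ h N^{-3/2} \sum_{i=1}^{\lfloor Ns \rfloor} K_h(t_i - t_{\lfloor Ns \rfloor}) Y_i $$

can be decomposed as

$$ h N^{-3/2} \sum_{i=1}^{\lfloor Ns \rfloor} K_h(t_i - t_{\lfloor Ns \rfloor}) \tilde Y_i + h N^{-3/2} \sum_{i=1}^{\lfloor Ns \rfloor} K_h(t_i - t_{\lfloor Ns \rfloor}) \sum_{j=1}^{i-1} m_j. $$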

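For the first term one may argue as in the proof of Theorem 3.1 to verify that

$$ h N^{-3/2} \sum_{i=1}^{\lfloor Ns \rfloor} K_h(t_i - t_{\lfloor Ns \rfloor}) \tilde Y_i \Rightarrow \sigma \int_0^s K(\zeta(r - s)) B(r)\,dr \tag{9} $$

as $N \to \infty$. Further, since $m_i = m_0([t_i - t_q]/h)\, h^{\beta}$, $\beta = -1/2$ implies

$$ \mu_N(s) = \frac{h}{N^{3/2}} \sum_{i=1}^{\lfloor Ns \rfloor} K_h(t_i - t_{\lfloor Ns \rfloor}) \sum_{j=1}^{i-1} m_0([t_j - t_q]/h)\, h^{-1/2} $$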



^ /V(C(r-s)) r m,{t-(:dlcP2)dtdr, 
, Jo Jo 



as $N \to \infty$, by (K) and (KM), uniformly in $s \in [a, 1]$ (cf. Steland 2004b, Th. 3.3 (ii)). Combining this fact with (9) and (8) yields

h ^ J^K{ar-s))B{r)dr C'/'J^K{ar-s))J^^'mo{t-C^lcP2)dtdt 



iV3/2 "'^^^ Cj^K{C{r-s))dr Cj^ - s)) dr 

in $D[a, 1]$, as $N \to \infty$. In contrast, if $\beta = 0$, we obtain convergence to a deterministic quantity if we change the scaling factor from $h N^{-3/2}$ to $h^{1/2} N^{-3/2}$. Indeed, in this case we have

$$ \frac{h^{1/2}}{N^{3/2}} \sum_{i=1}^{\lfloor Ns \rfloor} K_h(t_i - t_{\lfloor Ns \rfloor}) \tilde Y_i = o_P(1), $$

uniformly in $s \in [a, 1]$, and for the centering term

$$ h^{-1/2} \mu_N(s) = \frac{h^{3/2}}{N^{3/2}}\, h^{-2} \sum_{i=1}^{\lfloor Ns \rfloor} K([t_i - t_{\lfloor Ns \rfloor}]/h) \sum_{j=1}^{i-1} m_0([t_j - t_q]/h) $$

-Cr 



yielding 



/ K{C{r-s)) moit-Ci31cP2)dtdr, 
Jo Jo 



P /; KjCir - s)) J^' mo(t - C^Ic^p^) dt dr 

Ar3/2"^^^^^^ Cj^Kiar-s))dr 

uniformly in $s \in [a, 1]$, as $N \to \infty$.

Proof (of Theorems 5.1 and 5.2). We verify Theorem 5.1 (i), i.e., we assume $\beta = -1/2$ and $c(h, N) = h N^{-3/2}$. The other assertions are shown along these lines. Fix $0 < a < 1$. By Theorem 3.2 (i) the process $c(h, N) \hat m_{Nh}(s)$ converges weakly in $D[a, 1]$ to the non-stationary and a.s. continuous process

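$$ W_\zeta(s) = \frac{\sigma \int_0^s K(\zeta(r-s)) B(r)\,dr}{\zeta \int_0^s K(\zeta(r-s))\,dr} + \frac{\zeta^{1/2} \int_0^{\zeta s} K(r - \zeta s) \int_0^r m_0(t)\,dt\,dr}{\zeta \int_0^s K(\zeta(r-s))\,dr}, $$

as $N \to \infty$. Define the functional $\psi_a : D[a, 1] \to \mathbb{R}$,

$$ \psi_a(f) = \inf\{ a \le s \le 1 : f(s) > c \}, \qquad f \in D[a, 1]. $$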
Clearly, $\psi_a$ restricted to $E_c$ is continuous w.r.t. $\|\cdot\|_\infty$ and $d$, where $E_c = \{ f \in C[a, 1] : f(x^*) > c \text{ for some } x^* \}$. By (K) and (M) we have $W_\zeta \in C[a, 1]$ w.p. 1. Thus, since $S_N/N = \psi_a(c(h, N) \hat m_{Nh}(\cdot))$, the continuous mapping theorem yields

$$ S_N / N \xrightarrow{d} \psi_a(W_\zeta(\cdot)) = \inf\{ a \le s \le 1 : W_\zeta(s) > c \}, $$

as $N \to \infty$. Notice that

$$ \inf\{ a \le s \le 1 : W_\zeta(s) > c \} > x \iff \sup_{a \le s \le x} W_\zeta(s) \le c. $$

By a.s. continuity of $W_\zeta$, Theorem 2 of Lifshits (1982) ensures that the law $\mathcal{L}_x = \mathcal{L}(\sup_{a \le s \le x} W_\zeta(s))$ can have an atom only at the point

$$ \gamma_x = \sup_{a \le t \le x :\, \mathrm{Var}(W_\zeta(t)) = 0} E\,W_\zeta(t), $$

vanishes on $(-\infty, \gamma_x)$, and is absolutely continuous on $(\gamma_x, \infty)$. Since $\mathrm{Var}(W_\zeta(s)) > 0$ if $s > 0$, $\mathcal{L}_x$ is absolutely continuous. Therefore, we obtain convergence in distribution, i.e.,

$$ P\big( \inf\{ a \le s \le 1 : c(h, N) \hat m_{Nh}(s) > c \} \le x \big) \to P\big( \inf\{ a \le s \le 1 : W_\zeta(s) > c \} \le x \big), $$

as $N \to \infty$, for all $x$.

Proof (of Theorem 6.1). Using standard arguments of functional optimization theory, we see that $S^*(\zeta; K; m_0)$ is minimized w.r.t. $K$ if

$$ t(K) = \frac{\int_0^{s^*} K(\zeta(r - s^*)) \int_0^r m_0(t)\,dt\,dr}{\int_0^{s^*} K(\zeta(r - s^*))\,dr} \tag{10} $$

is maximized w.r.t. $K \in \mathcal{K}$, where $s^* = S^*(\zeta; K^*; m_0)$ denotes the optimal asymptotic normed delay (cf. Steland (2004b)). Clearly, $t(K) \ge 0$, and by the Cauchy-Schwarz inequality $t(K)$ is less than or equal to



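$$ \left( \int_0^{s^*} K(\zeta(r - s^*))^2\,dr \right)^{1/2} \left( \int_0^{s^*} \left( \int_0^r m_0(t)\,dt \right)^2 dr \right)^{1/2} \bigg/ \int_0^{s^*} K(\zeta(r - s^*))\,dr $$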



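with equality if and only if

$$ \frac{K(\zeta(r - s^*))}{\int_0^{s^*} K(\zeta(r - s^*))\,dr} = \lambda \int_0^r m_0(t)\,dt, \qquad r \in [0, s^*], $$

for some $\lambda$. Using $\int_0^{s^*} K(\zeta(r - s^*))\,dr = 1/2$ gives

$$ \lambda^{-1} = 2 \int_0^{s^*} \int_0^r m_0(t)\,dt\,dr \int_0^{s^*} K(\zeta(r - s^*))\,dr, $$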



i.e., the optimal (symmetric) kernel $K^*$ satisfies

$$ K^*(\zeta(r - s^*)) = \frac{\int_0^r m_0(t)\,dt}{2 \int_0^{s^*} \int_0^r m_0(t)\,dt\,dr}, \qquad r \in [0, s^*]. \tag{11} $$
Consequently, using $K^*(\zeta(r - s^*)) = K^*(\zeta(s^* - r))$ and substituting $z = \zeta(s^* - r)$ gives the representation in the theorem for $z \in [-\zeta s^*, \zeta s^*]$. Plugging $K^*$ as given in (11) into (10) yields immediately

$$ t(K^*) = \frac{\int_0^{s^*} K^*(\zeta(s^* - r)) \int_0^r m_0(t)\,dt\,dr}{\int_0^{s^*} K^*(\zeta(s^* - r))\,dr} = \frac{\int_0^{s^*} \left( \int_0^r m_0(t)\,dt \right)^2 dr}{\int_0^{s^*} \int_0^r m_0(t)\,dt\,dr}. $$

Therefore, the assertion for the optimal asymptotic normed delay follows.